#1
Welcome to the PC-ORD Advisor Wizard. The purpose of the wizard is to help you decide how to analyze your data, based on a decision tree.  You can also use it as a self-teaching tool.  We attempt to deal only with relatively common data analysis problems that you are likely to encounter with ecological community data.  You must have at least a main matrix open to run the wizard. Ready to go?
?
~
1
5, Proceed.
Use the "more info" button to get brief background or explanation of the choice that is before you. The principles underlying the decision tree are explained more fully in McCune and Grace (2002), "Analysis of Ecological Communities." The purpose of the wizard is to help get you started if you are unfamiliar with community analysis, not to attempt to deal with every possible contingency in your study design and goals. The underlying decision tree is not comprehensive, although it deals with the most common analysis problems with community data.

PC-ORD expects your data in two chunks: a main matrix, usually containing community data (sample units x species) and a second matrix, usually containing environmental variables and/or sampling design variables.  For example, if you had a two-factor experiment, you should have two columns in the second matrix, one numerically coding the first factor and the other numerically coding the second factor.  The second matrix commonly contains a mixture of categorical and quantitative variables.  All values in both matrices, apart from the row and column names, must be numeric.

Some people confuse rows and columns. Remember that columns are vertical -- picture the columns of the Parthenon!

A matrix is simply a table of numbers.

#5
Where would you like to start?
?
~67
2
2000, Data adjustments (transformations, relativizations)
10, Skip to analytical procedures
Either route should lead to the same conclusion, so this is just personal preference.  Or you may wish to just explore one part of the decision tree. 

Although data adjustments precede analytical procedures, some analytical procedures (linear models and those based on correspondence analysis) have built-in data adjustments that greatly reduce your options.  Distance-based analyses, on the other hand, tend not to have built-in data adjustments.  

#10
Do you wish to compare existing groups in your data?
?5001
~188
2
20, Yes
400, No, look for structure in my data without reference to groups.
Choose "yes" if you wish to compare two or more existing groups, such as treatment vs. control, different values of a categorical variable in your second matrix, or two matrices, each representing different groups of variables from the same set of sample units.

Choose "no" if you wish to ordinate your data or seek groups.

#20
Which kind of comparison do you want to make?  You can compare (A) groups of sample units in your main matrix, with the groups specified by a categorical variable in your second matrix.  Or you can compare (B) two groups of variables from the same sample units, one group of variables in the main matrix and the other group in the second matrix.
?5001
~185
2
200, A. Compare groups of sample units (by far the more common goal).
30, B. Compare groups of variables.
This question is asking you to state a goal for your analysis, specifically what kind of comparisons you want to make.  Do you want to compare groups of sample units or groups of variables?  You might want to make both kinds of comparisons; if so, just choose one for now.

Think of "groups of sample units" as slicing your matrix horizontally. This leads to traditional comparisons of groups, analogous to t-tests or analysis of variance.

Think of "groups of variables" as having sliced a matrix vertically, with one part in your main matrix and one part in your second matrix.  For example, say we measured a set of variables representing microbial activity in the surface organic soil horizon, and the same variables in the mineral soil horizons.  One might want to examine and test for relationships between the organic and mineral horizons.  If there is no relationship between those two sets of variables, then they should be analyzed as separate matrices (option 3 below), rather than as a combined matrix (options 1 or 2 below).

Note that in the above example there are four ways that the data could be organized.  Say we have 3 sites, 2 soil horizons, and 3 microbial activity variables.  The four arrangements are:

1. 3 sites x 6 columns (horizon-activity combinations).  Each column represents a microbial activity variable for a particular horizon.  This option makes sense if you wish to compare sites, integrating the information from the two horizons, while preserving the ability to see how activities in different horizons are related.

2. 6 rows (site-horizon combinations) x 3 activity variables.  The values in the matrix are the same as for arrangement 1, except that the right 3 x 3 half of the matrix is detached, then placed under the left 3 x 3 half, giving a 6 x 3 matrix.  This option makes sense if you wish to compare horizons and to compare sites.

3. Two separate 3 x 3 matrices, each matrix having 3 sites x 3 activity variables.  One matrix is for the organic horizon and one matrix is for the mineral horizon.  This option makes sense if the correlation structure among activities is very different between horizons, and you wish to study them independently.

4. One matrix of 3 sites x 3 activity variables, each element (cell in the matrix) representing the average activity between the organic and mineral layers.  This option makes sense only if there is very little difference between layers.
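
The four arrangements above can be sketched in a few lines of Python. This is only an illustration with made-up activity values; the variable names are hypothetical and nothing here reflects how PC-ORD stores matrices internally.

```python
"""Four layouts of 3 sites x 2 horizons x 3 activity variables (made-up values)."""

organic = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # rows = sites, columns = activities
mineral = [[0.5, 1.5, 2.5], [3.5, 4.5, 5.5], [6.5, 7.5, 8.5]]  # same sites, same activities

wide = [o + m for o, m in zip(organic, mineral)]  # arrangement 1: 3 rows x 6 columns
stacked = organic + mineral  # arrangement 2: right half placed under left half, 6 rows x 3 columns
separate = (organic, mineral)  # arrangement 3: two separate 3 x 3 matrices
averaged = [[(a + b) / 2 for a, b in zip(o, m)]
            for o, m in zip(organic, mineral)]  # arrangement 4: mean of the two horizons


def shape(matrix):
    """Return (rows, columns) of a list-of-lists matrix."""
    return len(matrix), len(matrix[0])
```

Calling shape(wide) and shape(stacked) confirms that both hold the same 18 values, arranged as 3 x 6 and 6 x 3 respectively.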

##30
A Mantel test is recommended.  You must have both a main and second matrix open.  If so, and if you wish to run it now, click Run.  If not, you can exit the Advisor, open the matrices, then run the Mantel test from the Groups menu.
?5034
~211
1
@76 Mantel
A Mantel test evaluates the null hypothesis of no relationship between two distance matrices.  The matrices can be calculated by PC-ORD or they can be read directly as distance matrices. 

For example, say we are interested in whether two sets of species, birds and plants, show related patterns from the same set of sample units (SUs).  We set up two matrices, one with SUs x bird species, the second with SUs x plant species.  A distance matrix is calculated from each matrix, and the Mantel test then evaluates the null hypothesis of no relationship between the two distance matrices, and hence between birds and plants.

The Mantel test can also be used to assess the relationship between geographic distance among populations and genetic difference among populations. Many other applications are possible (see McCune & Grace 2002).
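
For readers who want to see the logic, here is a minimal sketch of a randomization-based Mantel test in Python. It is not PC-ORD's implementation: the function names, the use of Euclidean distance, the Pearson correlation as the standardized statistic, and the one-sided p-value are all assumptions made for illustration.

```python
import math
import random


def euclidean_distances(matrix):
    """Full symmetric matrix of Euclidean distances between rows."""
    return [[math.dist(a, b) for b in matrix] for a in matrix]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(dist, order):
    """Distances below the diagonal, with sample units taken in the given order."""
    return [dist[order[i]][order[j]] for i in range(len(order)) for j in range(i)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p): correlation of the two distance matrices and a permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    a_values = lower_triangle(dist_a, identity)
    observed = pearson(a_values, lower_triangle(dist_b, identity))
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # reorder sample units in the second matrix only
        if pearson(a_values, lower_triangle(dist_b, order)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)
```

Rows and columns of one distance matrix are permuted together, rather than shuffling individual distances, because the distances are not independent of each other.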

#40
What kind of variables are in your main matrix?
?7564
~35
3
70, Species abundances or presence-absence
60, Non-community data with approximately linear relationships
75, Other non-community data with potentially nonlinear relationships
If your variables have more-or-less linear relationships to each other, then traditional linear models will be powerful, appropriate, and informative.  

Ecological community data (species abundance or presence in a sample) typically have grossly nonlinear relationships and the variables are distributed far from normality. Instead, community data typically form a dust-bunny distribution of sample units in species space.  For more information on dust bunnies, see McCune and Grace (2002).

Examples of non-community data:
  - species x life history characteristics
  - individuals x behavioral activities
  - populations x demographic parameters
  - sites x climatic variables

Some of these non-community examples can have nonlinear relationships among the variables as well.

#60
Which question best describes your problem?  
A. How can I combine variables in the main matrix to best predict group membership?  How well can I do this?
B. How different are the groups?
?7608
~205
2
220, How can I combine variables...?
230, How different are the groups?
Multivariate analysis of variance (MANOVA) and discriminant analysis (DA) have the same mathematical core, but the results are expressed differently.  With MANOVA, the emphasis is on whether and how much the groups differ.  With DA, the emphasis is on seeking variables that best separate the groups.  Furthermore, MANOVA allows much more complex designs (such as factorial, nested, repeated measures, and blocking), while DA uses a one-way classification, with two or more groups.

#70
Do you wish to compare species composition between groups or individual species between groups?  
?5026
~198
2
75, Species composition
72, Individual species, one at a time
Some methods of comparing groups, such as MRPP and perMANOVA, are multivariate -- they evaluate group differences in a number of response variables (species in this case) simultaneously.  On the other hand, Indicator Species Analysis is a useful tool for comparing individual species between groups, evaluating species one at a time, but presenting the results in a single table.  If you wish to test the hypothesis of no difference in species composition among groups, select the first choice.  If you wish to test the hypothesis of no difference in individual species among groups, taking species one at a time, select the second choice.


#72
Consider Indicator Species Analysis. Which design best matches your problem?
?5026
~198
2
73, Simple one-way classification (single grouping variable or factor)
74, One-way classification with blocks or pairing.
To use Indicator Species Analysis, you need a single grouping variable and data on species abundance or presence-absence. Usually the grouping variable is a categorical variable in your second matrix. In addition, you may have a categorical variable in your second matrix that defines blocks in your data. Blocks can be spatial, as in a traditional randomized complete block (RCB) design, or they can be temporal, as in a repeated measures design. However, only one grouping variable can be evaluated at a time, apart from the blocks. As in a traditional RCB design, each block must have only one instance of each treatment (or group), and the design must be balanced. Blocks cannot have only one item.

##73
Consider Indicator Species Analysis.
?5026
~198
1
@275 ISA
To use Indicator Species Analysis, you need a single grouping variable and data on species abundance or presence-absence.  Usually the grouping variable is a categorical variable in your second matrix.  Indicator Species Analysis compares each species among groups, one species at a time.  The analysis is based on a combination of concentration of abundance into particular groups and faithfulness of presence in particular groups.
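
The combination of concentration and faithfulness can be sketched as follows. This assumes the widely used Dufrene and Legendre (1997) form of the index, in which the indicator value is 100 x relative abundance x relative frequency; the function name is hypothetical, and the randomization test of significance is not shown.

```python
def indicator_values(abundances, groups):
    """Indicator value (0 to 100) of one species in each group.

    abundances: abundance of the species in each sample unit.
    groups: numeric group code of each sample unit, in the same order.
    """
    codes = sorted(set(groups))
    mean_abundance = {}
    frequency = {}
    for code in codes:
        values = [a for a, g in zip(abundances, groups) if g == code]
        mean_abundance[code] = sum(values) / len(values)
        frequency[code] = sum(1 for v in values if v > 0) / len(values)  # faithfulness of presence
    total = sum(mean_abundance.values())
    result = {}
    for code in codes:
        concentration = mean_abundance[code] / total if total > 0 else 0.0  # share of abundance
        result[code] = 100.0 * concentration * frequency[code]
    return result
```

A species found in every sample unit of one group and nowhere else gets the maximum value of 100 for that group and 0 for the others.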


##74
Consider Blocked Indicator Species Analysis.
?16001
~198
1
@78 Blocked ISA
Indicator Species Analysis can be adapted to a randomized block experiment or a paired-sample design by pre-relativizing by species within blocks (or pairs), such that the sum across groups equals one for each block. If a species is absent from a block, the abundances are maintained at zero. The relativization alters the relative abundance portion of the Indicator Value (IV) index to focus on within-block differences. Then the ISA is run as usual. The randomization test is adjusted accordingly. Blocks can be spatial, as in a traditional randomized complete block (RCB) design, or they can be temporal, as in a repeated measures design. However, only one grouping variable can be evaluated at a time, apart from the blocks. As in a traditional RCB design, each block must have only one instance of each treatment (or group), and the design must be balanced. Blocks cannot have only one item. In PC-ORD, blocks and groups are defined by categorical variables in the second matrix.
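
The pre-relativization step can be illustrated with a short Python sketch for a single species. The function name is hypothetical and this is not PC-ORD's code; it only shows the arithmetic of rescaling within blocks.

```python
def relativize_within_blocks(abundances, blocks):
    """Rescale one species so its abundances sum to one within each block.

    Blocks from which the species is absent are left at zero.
    """
    totals = {}
    for value, block in zip(abundances, blocks):
        totals[block] = totals.get(block, 0.0) + value
    return [value / totals[block] if totals[block] > 0 else 0.0
            for value, block in zip(abundances, blocks)]
```

For example, abundances of 2 and 6 in the two treatments of one block become 0.25 and 0.75, so only the within-block contrast remains.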


#75
Which design best matches your problem?
?1003
~188
4
80, Simple one-way classification (single grouping variable or factor)
90, One-way classification with blocks or pairing.
100, Same plots at two or more dates.
110, Combinations of the above or other designs with two or more factors.
A one-way classification can be represented by a single categorical variable.  For example, the variable "Treatment" might be coded as 0=control and 1=treated.

Blocking: randomized block designs and paired-sample data.  Each block (or pair) has one sample unit for each treatment.  Usually blocks are individual sites or plots with two or more treatments.

Factor:  A classification (grouping variable) that assigns each observation to one level of the classification.  In PC-ORD, the factors are chosen from individual columns in the second matrix.

Simple repeated measures designs (e.g. a set of plots sampled at a series of dates) can also be analyzed as blocks, where the goal is to detect differences through time.

#80
Do you have a balanced design?
?1019
~
2
83, Yes.  Each group has an equal number of sample units (rows).
86, No.  The groups differ in number of sample units.
Balanced:  Every cell of the design has an equal number of observations (data points).

Cell:  A subset of the data occurring at the intersection of one level of every factor being considered.  Every data point in a particular design belongs to one and only one cell of the design.

Factor:  A classification (grouping variable) that assigns each observation to one level of the classification.  In PC-ORD, the factors are chosen from individual columns in the second matrix.
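
If you are unsure whether your design is balanced, you can count the rows in each cell. The sketch below does this in Python for one or more crossed factors; the function name is hypothetical, and each factor is assumed to be a column of numeric codes from the second matrix.

```python
from collections import Counter


def is_balanced(*factors):
    """True if every combination of factor levels occurs, each with the same number of rows."""
    cell_counts = Counter(zip(*factors))  # one key per observed cell
    possible_cells = 1
    for factor in factors:
        possible_cells *= len(set(factor))
    all_cells_present = len(cell_counts) == possible_cells
    equal_sizes = len(set(cell_counts.values())) == 1
    return all_cells_present and equal_sizes
```

With a single grouping variable, three groups of 3, 3, and 4 sample units are reported as unbalanced, while three groups of 3 each are reported as balanced.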

#83
Consider using MRPP or perMANOVA (a.k.a. NPMANOVA).  Both are permutation-based techniques that do not depend on distributional assumptions.  For one-way balanced designs we have little basis for choosing between them.  Select the variable that defines the groups as the "grouping variable."  With MRPP, use the A-statistic as a measure of "effect size" and the p value to evaluate statistical significance. With perMANOVA a traditional ANOVA table is produced.
?1003
~188
2
84, Try PerMANOVA
85, Try MRPP
To use MRPP or perMANOVA you need to define a categorical grouping variable in your second matrix.  The variable must be coded numerically, for example:
	Burn
	1
	1
	1
	2
	2
	2
	3
	3
	3
	3
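
If your groups are currently recorded as text labels, they need to be converted to numbers before they go into the second matrix. One possible way to do this outside PC-ORD is sketched below in Python; the function name and the label values are made up for illustration.

```python
def numeric_codes(labels):
    """Map category labels to integer codes 1, 2, 3, ... in order of first appearance."""
    codes = {}
    return [codes.setdefault(label, len(codes) + 1) for label in labels]
```

Keep a note of which code stands for which label, since only the numbers are entered in the matrix.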

##84
Consider using perMANOVA (NPMANOVA).
?1001
~196
1
@77 PerMANOVA
To use perMANOVA you need to define a categorical grouping variable in your second matrix.  The variable must be coded numerically, for example:
	Burn
	1
	1
	1
	2
	2
	2
	3
	3
	3
	3
##85
Consider using MRPP.
?5011
~188
1
@272 MRPP from second matrix
To use MRPP you need to define a categorical grouping variable in your second matrix.  The variable must be coded numerically, for example:
	Burn
	1
	1
	1
	2
	2
	2
	3
	3
	3
	3

##86
MRPP is recommended for unbalanced 1-way classifications.
?5011
~188
1
@272 MRPP from second matrix
To use MRPP you need to define a categorical grouping variable in your second matrix.  The variable must be coded numerically, for example:
	Burn
	1
	1
	1
	2
	2
	2
	3
	3
	3
	3


#90
Consider using blocked MRPP or perMANOVA. Do you wish to use a non-Euclidean distance measure, such as Sorensen (Bray-Curtis), as is often appropriate for community data?
?8501
~193
2
93, Use Euclidean distance measure
96, Use a non-Euclidean distance measure
Blocked MRPP requires the use of Euclidean distance.  PerMANOVA can be applied to a wide range of distance measures, including Sorensen (Bray-Curtis) distances, which are popular and effective for ecological community data.
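
To make the difference between the two kinds of distance concrete, here is a small Python sketch that computes both between two sample units. The function names are hypothetical; the Sorensen (Bray-Curtis) formula used is the standard one, summed absolute differences divided by summed abundances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two sample units."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sorensen(a, b):
    """Sorensen (Bray-Curtis) distance between two sample units with nonnegative abundances."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total > 0 else 0.0
```

For nonnegative abundances, Sorensen distance reaches its maximum of 1 when two sample units share no species, whereas Euclidean distance has no fixed upper limit and grows with the abundances themselves.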

#93
Consider using blocked MRPP or perMANOVA.  Both are applicable to this situation, and in this case we have little basis for choosing between them. Both require one instance of each group in each block.  Select a grouping variable (which can be time, or treatment, etc.) and a blocking variable.  To do this you need these two variables in your second matrix: a categorical variable coding for the grouping variable and a categorical variable coding for block number.
?5037
~193
2
96, Try PerMANOVA
94, Try blocked MRPP
To use perMANOVA for a blocked design you need two variables in your second matrix: a categorical variable coding for your grouping variable (for example, one of three thinning treatments), and a categorical variable coding for block.  The variables must be coded numerically, one instance of each group for each block. For example, if we have three thinning treatments, each at a number of sites, sites are blocks and thinning is the grouping variable:
	Site	Thinned
	1	1
	1	2
	1	3
	2	1
	2	2
	2	3

##94
Consider using blocked MRPP.  This requires one instance of each group in each block.  Select a grouping variable (which can be time, or treatment, etc.) and a blocking variable.  To do this you need these two variables in your second matrix: a categorical variable coding for the grouping variable and a categorical variable coding for block number.
?5037
~193
1
@273 blocked MRPP
To use blocked MRPP you need two variables in your second matrix: a categorical variable coding for your grouping variable (for example, one of three thinning treatments), and a categorical variable coding for block.  The variables must be coded numerically, one instance of each group for each block. For example, if we have three thinning treatments, each at a number of sites, sites are blocks and thinning is the grouping variable:
	Site	Thinned
	1	1
	1	2
	1	3
	2	1
	2	2
	2	3


##96
Consider using a blocked design with perMANOVA.  This requires one instance of each group in each block.  Select a grouping variable (which can be time, or treatment, etc.) and a blocking variable.  To do this you need these two variables in your second matrix: a categorical variable coding for the grouping variable and a categorical variable coding for block number.
?1003
~196
1
@77 PerMANOVA
To use perMANOVA you need two variables in your second matrix: a categorical variable coding for your grouping variable (for example, one of three thinning treatments), and a categorical variable coding for block.  The variables must be coded numerically, one instance of each group for each block. For example, if we have three thinning treatments, each at a number of sites, sites are blocks and thinning is the grouping variable:
	Site	Thinned
	1	1
	1	2
	1	3
	2	1
	2	2
	2	3

#100
Consider using blocked MRPP or perMANOVA. Do you wish to use a non-Euclidean distance measure, such as Sorensen (Bray-Curtis), as is often appropriate for community data?
?8501
~193
2
103, Use Euclidean distance measure
106, Use a non-Euclidean distance measure
Blocked MRPP requires the use of Euclidean distance.  PerMANOVA can be applied to a wide range of distance measures, including Sorensen (Bray-Curtis) distances, which are popular and effective for ecological community data.

#103
Consider using blocked MRPP or perMANOVA with time as the grouping variable and sample unit as the blocks. Both are applicable to this situation, and in this case we have little basis for choosing between them. Both require one instance of each group in each block.  Select time as your grouping variable and sample unit as the blocking variable.  
?5037
~193
2
105, Try blocked MRPP
106, Try perMANOVA
To use blocked MRPP or perMANOVA with time as the grouping variable and sample unit as blocks, you need these two variables in your second matrix: a categorical variable coding for discrete sample dates (or rounds of sampling) and a categorical variable coding for sample unit number. The variables must be coded numerically, for example:
	Date   Plot
	1	1
	1	2
	1	3
	2	1
	2	2
	2	3

##105
Consider using blocked MRPP with time as the grouping variable and sample unit as the blocks.
?5037
~193
1
@273 blocked MRPP from 2nd matrix
To use blocked MRPP with time as the grouping variable and sample unit as blocks, you need these two variables in your second matrix: a categorical variable coding for discrete sample dates (or rounds of sampling) and a categorical variable coding for sample unit number.  The variables must be coded numerically, for example:
	Date   Plot
	1	1
	1	2
	1	3
	2	1
	2	2
	2	3

##106
Consider using perMANOVA with time as the grouping variable and sample unit as the blocks.
?1001
~196
1
@77 PerMANOVA
To use perMANOVA with time as the grouping variable and sample unit as blocks, you need these two variables in your second matrix: a categorical variable coding for discrete sample dates (or rounds of sampling) and a categorical variable coding for sample unit number.  The variables must be coded numerically, for example:
	Date   Plot
	1	1
	1	2
	1	3
	2	1
	2	2
	2	3

#110
Which design best matches your problem?
?1003
~218
4
120, Two-way factorial
130, One fixed factor and one level nested (Mixed Model)
140, Two levels nested (Model II)
150, More than two factors

Factor:  A classification that assigns each observation to one level of the classification.  In PC-ORD, the factors are chosen from individual columns in the second matrix.

Fixed effect (or fixed factor):   An effect attributable to a factor where levels are set to fixed, prescribed values, those values chosen because we are interested in them.  A fixed model has only fixed effects.

Random effect (or random factor):  An effect attributable to a factor where the levels are selected at random from a potentially infinite set of levels.  For example, a subsample of 5 subplots from a plot is selected from a large number of possible subplots.

Two-way factorial:  Observations are replicated within combinations of two factors (grouping variables).  These two factors are selected as grouping variables.  Interaction terms are calculated.  The factors are assumed to have fixed effects. (EXAMPLE BELOW)

One fixed factor and one level nested (Mixed Model):  observations are replicated within one factor which is nested within a fixed factor.  (EXAMPLE BELOW)

Mixed model:  A model combining fixed and random effects.

Nested:  One factor is nested within another when groups are divided into subgroups.  The nested factor is usually considered to be a random effect.  Nested designs are usually produced by a subsampling procedure:  sample units are divided into two or more subsamples. 

Two levels nested (Model II):  Observations are replicated within one factor which is nested within a second factor.  Both factors are assumed to be random effects. (EXAMPLE BELOW)

EXAMPLE OF 2-WAY FACTORIAL DESIGN
Forest floor plots were given more water vs. not (Factor1=water) and more nitrogen vs. not (Factor2=nitrogen).  Each factor has 2 levels, so there are 2 x 2 = 4 cells in the design.  Each combination of nitrogen and water was replicated three times.  Abundance was recorded for 10 species at one date, one year after the experiment started.  The main matrix thus has 12 rows x 10 columns.  The second matrix has 12 rows x 2 columns (water, nitrogen).

EXAMPLE OF 1 FIXED AND 1 RANDOM FACTOR (MIXED MODEL)
Four of 8 forest floor plots were given more water while 4 were not (Factor1=water).  Each plot was subsampled with 3 subplots (Factor2=subplot). So there are 2 x 3 = 6 cells in the design and 4 replicates per cell, yielding 6 cells x 4 replicates = 24 sets of observations.  Water has a fixed effect and subplot has a random effect.  Abundance was recorded for 10 species at one date, one year after the experiment started.  The main matrix thus has 24 rows x 10 columns.  The second matrix has 24 rows x 2 columns (water, subplot).

EXAMPLE OF 2 LEVELS NESTED
Four sites were visited in each of 5 watersheds (factor1=watershed, factor2=site).  Each site was subsampled with 3 plots (replicates). So there are 4 x 5 = 20 cells in the design and 3 replicates per cell, yielding 20 cells x 3 replicates = 60 sets of observations.  Both watershed and site are considered random effects.  Abundance was recorded for 10 species.  The main matrix thus has 60 rows x 10 columns.  The second matrix has 60 rows x 2 columns (watershed, site).
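
The row counts in the examples above can be checked by generating the design columns of the second matrix. The Python sketch below is illustrative only: the function names are hypothetical, and restarting the nested codes at 1 within each outer level is an assumption, not a statement of how PC-ORD expects nested factors to be coded.

```python
from itertools import product


def factorial_second_matrix(levels_a, levels_b, replicates):
    """Rows of (factor A, factor B) codes for a fully crossed, replicated design."""
    return [(a, b) for a, b in product(levels_a, levels_b) for _ in range(replicates)]


def nested_second_matrix(n_outer, n_inner_per_outer, replicates):
    """Rows of (outer, inner) codes; inner codes restart at 1 within each outer level."""
    return [(outer, inner)
            for outer in range(1, n_outer + 1)
            for inner in range(1, n_inner_per_outer + 1)
            for _ in range(replicates)]
```

The water x nitrogen example gives factorial_second_matrix((1, 2), (1, 2), 3) with 12 rows, and the watershed and site example gives nested_second_matrix(5, 4, 3) with 60 rows.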



##120
Use perMANOVA, selecting a two-way factorial design. 
?1001
~197
1
@77 PerMANOVA
Two-way factorial:  Observations are replicated within combinations of two factors.  These two factors are selected as grouping variables.  Interaction terms are calculated.  The factors are assumed to have fixed effects.

##130
Use perMANOVA, selecting a mixed model.
?1001
~218
1
@77 PerMANOVA
One fixed factor and one level nested (Mixed Model):  observations are replicated within one factor which is nested within a fixed factor.

##140
Use perMANOVA, selecting a two-factor nested design.
?1001
~218
1
@77 PerMANOVA
Two levels nested (Model II):  Observations are replicated within one factor which is nested within a second factor.  Both factors are assumed to be random effects.

##150
Your full design is too complex to be analyzed with perMANOVA or SumF in PC-ORD.  Consider ways to simplify your design:  are there any factors in your design that have little effect and can be ignored?  Can you analyze each round of sampling separately?  Can you collapse a nested design by averaging lower levels up to the level of your treatments?
?1001
~218
1
1, go back to start
Marti Anderson in New Zealand offers free downloads from her website of a program for perMANOVA that allows more complex designs.  This software has an interface that is more difficult to use than PC-ORD, but it offers more design options.

#200
Do you seek a graphical representation of the groups or a measure of the size of the difference, along with a hypothesis test of no difference?
?
~
2
500, Graphical representation
40, Measure size of difference and test hypothesis
Methods for evaluating group differences, such as MRPP, ANOVA, and Indicator Species Analysis, do not usually have built-in graphical representations of those differences. If you wish to visually assess the multivariate differences between groups, ordination methods are useful.  In particular, you will want to use your grouping variable (normally a categorical variable in the second matrix) as an overlay on your ordination.  Different groups are shown by different symbols and/or colors.


##220
Consider using discriminant analysis or logistic regression.  These techniques are not available in PC-ORD but can be found in the standard statistical packages.
?7608
~205
1
1, Go back to beginning.
PC-ORD emphasizes nonparametric, distribution-free methods and methods developed for ecological community data.  For standard linear parametric statistics, please use your favorite traditional statistical package.


##230
Consider using multivariate analysis of variance.  This technique is not available in PC-ORD but can be found in the standard statistical packages.
?7584
~184
1
1, Go back to beginning.
PC-ORD emphasizes nonparametric, distribution-free methods and methods developed for ecological community data.  For standard linear parametric statistics, please use your favorite traditional statistical package.


#400
Do you wish to seek and define groups of sample units in your data?
?5002
~80
2
405, Yes.
500, No, formal groups are not necessary.
Finding groups in your data is known generically as cluster analysis.  The many methods of cluster analysis all seek to define groups in your data by analyzing dissimilarities.  

Of course there is nothing to prevent you from identifying and naming groups from graphical representations of similarity relationships (i.e. ordination). 

#405
Do you have an extremely large data set with, for example, 10,000 or more sample units?
?
~84
2
407, Yes
410, No
Many methods of clustering increase in memory demand and computational time with the square of the number of rows in the matrix.  Such "N-squared" routines become impossibly slow and memory-consuming with very large data sets, such as classifying pixels in satellite imagery.  Other methods, such as "K-means", increase computational demands only linearly with the number of rows.  Such methods are practical for clustering very large data sets.
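
A quick calculation shows why "N-squared" methods become impractical. The Python sketch below counts the pairwise distances among N sample units and estimates their storage; the function names are hypothetical, and 8 bytes per value is an assumption about how the distances are stored.

```python
def pairwise_distance_count(n_rows):
    """Number of distinct pairwise distances among n_rows sample units."""
    return n_rows * (n_rows - 1) // 2


def distance_matrix_megabytes(n_rows, bytes_per_value=8):
    """Approximate storage for all pairwise distances, in megabytes."""
    return pairwise_distance_count(n_rows) * bytes_per_value / 1e6
```

With 100 sample units there are 4,950 distances; with 10,000 there are 49,995,000, or roughly 400 megabytes at 8 bytes each, before any clustering work begins.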

##407
A non-hierarchical clustering method, such as "k-means", is recommended.  This is not available in PC-ORD, but can be found in GIS software.
?
~84
1
1, Start over.


#410
Do you wish to display row and column dendrograms in relationship to each other in a graphical view of the main matrix?
?5047
~
2
415, Yes
430, No, just a single dendrogram.
You can construct a single dendrogram (tree diagram) representing relationships among your rows in your main matrix, or you can simultaneously display two dendrograms, one representing relationships among rows, the other representing relationships among columns.  

The purpose of our two-way clustering (also known as biclustering) is to graphically expose the relationship between cluster analyses and your individual data points.  The resulting graph makes it easy to see similarities and differences between rows in the same group, rows in different groups, columns in the same group, and columns in different groups.  You can see graphically how groups of rows and columns relate to each other.

#415
Are ALL of the following true?
- main matrix contains species abundances or presence-absence
- groups are sought along only one strong underlying gradient (or underlying dimension)
- ok to implicitly use chi-square distance measure to represent differences between sample units
- ok to represent abundances as "pseudospecies"
?5022
~97
2
420, Yes
425, No
Even though TWINSPAN has been a popular method, it has a large number of restrictions that limit its sensible use.  The list of conditions in this question attempts to capture the essence of the limitations of TWINSPAN.

##420
Two-way indicator species analysis is recommended.
?5022
~97
1
@74 TWINSPAN
TWINSPAN simultaneously classifies species and samples.  The program is geared towards ecological data.  At its core, TWINSPAN is based on dividing a reciprocal averaging ordination space.  

One of the most useful features of TWINSPAN is the final ordered two-way table.  Species names are arrayed along the left side of the table, while sample numbers are along the top.  The pattern of zeros and ones on the right and bottom sides defines the dendrogram of the classifications of species and samples, respectively.  The interior of the table contains the abundance class of each species in each sample.  Abundance classes are defined by pseudospecies cut levels (defined below).

TWINSPAN inherited a number of faults from its parent method, reciprocal averaging (RA, CA, or correspondence analysis).  Like RA, it performs poorly when there is more than one important underlying gradient.  This limitation is particularly awkward when trying to interpret the ordered table.  Unlike cluster analysis, which has no inherent reduction in dimensionality, TWINSPAN cannot effectively represent complex data sets in its one-dimensional framework.

TWINSPAN does not analyze abundance data directly, but is based on presence/absence.  However, it approximates quantitative abundance data by creating a variable number of "pseudospecies" representing abundance classes.  The "pseudospecies cut levels" are used to define the ranges of the abundance classes.

Suppose you have data ranging from 0 to 100% cover.  You need to define cover classes from the data (maximum number of classes = 9).  The default cut levels are 0, 2, 5, 10, and 20, and are reasonable for your data in this case.  These are cutoff points for pseudospecies 1, 2, 3, 4, and 5.  A species (say its code name is Testsp) that had an abundance of 8% in a given sample would be interpreted as "present" for three pseudospecies: Testsp 1, Testsp 2, and Testsp 3 (the digit after the species name indicates the pseudospecies level).  An input value of 1.5% would result in only one pseudospecies being "present", Testsp 1.

Selection of appropriate cut levels of pseudospecies is important for retaining the quantitative nature of your data. For example, if you had relativized your data such that all of the values fell between 0 and 1, then use of the default cut levels would result in your data being treated as presence/absence. This would occur because only the first pseudospecies level would be used.  If you want to retain the quantitative information in your data, cut levels such as .0, .02, .05, .10, and .20 would be appropriate.
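
The effect of cut levels can be reproduced with a short Python sketch. The function name is hypothetical, and the rule used here (a pseudospecies is present when the abundance is greater than zero and at least as large as its cut level) is an assumption chosen to match the examples above, not a transcription of the TWINSPAN code.

```python
def pseudospecies_levels(abundance, cut_levels=(0, 2, 5, 10, 20)):
    """Pseudospecies levels (numbered from 1) scored as present for one abundance value."""
    return [level for level, cut in enumerate(cut_levels, start=1)
            if abundance > 0 and abundance >= cut]
```

An abundance of 8 yields levels 1, 2, and 3, and 1.5 yields level 1 only. A relativized value of 0.5 yields only level 1 with the default cut levels, but all five levels with cut levels of 0, .02, .05, .10, and .20.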

##425
Two-way cluster analysis is recommended.
?5047
~
1
@71 Two-way cluster analysis
Two-way clustering refers to doing a cluster analysis on both the rows and columns of your matrix, followed by graphing the two dendrograms simultaneously, adjacent to a representation of your main matrix.  Rows and columns of your main matrix are re-ordered to match the order of items in your dendrogram. 

The purpose of our two-way clustering (also known as biclustering) is to graphically expose the relationship between cluster analyses and your individual data points.  The resulting graph makes it easy to see similarities and differences between rows in the same group, rows in different groups, columns in the same group, and columns in different groups.  You can see graphically how groups of rows and columns relate to each other.

The term "biclustering" is used frequently for analysis of patterns in microarray data.  In its strict sense, biclustering involves simultaneous row-column clustering, while "two-way clustering" uses one-way clustering on each of the two dimensions of the data matrix separately (Madeira & Oliveira 2004).  Combining the two sets of clusters into a single graph can be used to produce subgroups of rows and columns that could be considered biclusters.  Madeira and Oliveira defined a bicluster as a subset of rows that exhibit similar behavior across a subset of columns, and vice-versa.  The "merit function" for defining a bicluster can be defined in many different ways, many of which make sense for microarray data but not for community data.  Furthermore, the goals of community analysis vs. microarray analysis differ: in the latter we seek one or more groups of functionally linked genes that are expressed in concert.  In community analysis, on the other hand, it is unreasonable to expect hard-edged groups; if we seek groups, we anticipate softer-edged fuzzy sets.  Furthermore, rather than picking out one or more non-nested biclusters from a data set, we are usually interested in assigning every sample unit and species to a hierarchical structure.  For these purposes, it appears that two-way clustering will serve community ecologists better than techniques developed for biclustering of microarray data.

##430
Hierarchical agglomerative cluster analysis is recommended.
?5002
~86
1
@70 CLUSTER ANALYSIS


#500
Are you interested in relationships in the main matrix only as they relate to the second matrix, or are you interested in relationships in the main matrix, as well as the relationship between main and second matrices?
?4016
~164
2
600, Main matrix only in relationship to second.
700, Main matrix relationships and relationships of main to second matrix.
This is a critical question.  In the case of community data, the answer separates ordinations in species space from ordinations in environmental space.  In both cases you can relate your results to the environmental matrix, but the ordinations in environmental space ignore any community relationships that are not simply related to your measured environmental variables. In that case, the representation of the community matrix is constrained by the environmental matrix.

An ordination in environmental space is also known as "direct gradient analysis" or "direct ordination."  An ordination in species space is also known as "indirect gradient analysis" or "indirect ordination."  This is considered indirect, because the strongest community relationships are extracted first, then secondarily related to measured environmental variables.

Indirect ordination is a more general-purpose technique, because it detects patterns in your species data regardless of whether or not you have measured appropriate and useful environmental variables.  Direct ordination is most useful when you wish to test an hypothesis about whether a specific continuous environmental variable is related to community structure. For more on this topic, see pages 102-103 and 164-165 in McCune and Grace (2002).

#600
Do you wish to model variables in the main matrix one at a time in relationship to variables in the second matrix (univariate response), or do you wish to extract the dominant correlation structure in the main matrix in relationship to the second (multivariate response)?
?2008
~11
2
610, One at a time (univariate response)
620, Simultaneous (multivariate response)
If you have community data (sample units x species in the main matrix), this question can be rephrased as whether you want to relate single species to environment (or other variables in your second matrix) or the community as a whole to environment.

##610
Regress each variable in your main matrix against the variables in your second matrix.  For species data we recommend nonparametric multiplicative regression (NPMR) in HyperNiche (see www.hyperniche.com). This approach allows nonlinear responses and accommodates complex interactions among predictors.
?2008
~11
1
1, Start over.

#620
Which form of relationship do you anticipate between variables in your main matrix and variables in your second matrix?
?4016
~164
2
630, linear (reasonably approximated by straight lines)
640, unimodal (hump-shaped)
Choosing a linear relationship means that straight lines reasonably describe the relationship between variables in your main and second matrices.   With community data, a unimodal (hump-shaped) relationship is usually a better choice, unless you are studying a data set with low heterogeneity (beta diversity). You can determine your beta diversity by running Advisor | Show Current Profile.  A rule of thumb for Whittaker's beta diversity: low is < 1, medium is 1-5 and high is > 5.  A rule of thumb for beta diversity measured in average half changes: low is < 1, medium is 1-2, and high is > 2.  See also McCune and Grace (2002, p. 31).

You can evaluate this by making a series of scatterplots of one species' abundance versus another (In PC-ORD, select Graph | Scatterplot or Graph | Scatterplot Matrix).  Are the relationships among species reasonably approximated by straight lines?  If so, you should consider using a linear model.
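
Whittaker's beta diversity, mentioned in the rule of thumb above, is commonly computed as the total number of species divided by the average number of species per sample unit, minus one. The sketch below applies that formula to a small made-up matrix and then applies the thresholds given above. The function names and the example data are hypothetical, and the result may not match Advisor | Show Current Profile in every detail.

```python
def whittaker_beta(matrix):
    """Whittaker's beta diversity: total species / mean richness - 1.

    `matrix` is a list of rows (sample units) by columns (species).
    """
    n_species_total = sum(
        1 for column in zip(*matrix) if any(value > 0 for value in column)
    )
    richness = [sum(1 for value in row if value > 0) for row in matrix]
    mean_richness = sum(richness) / len(richness)
    return n_species_total / mean_richness - 1


def beta_class(beta_w):
    """Apply the rule of thumb: low < 1, medium 1-5, high > 5."""
    if beta_w < 1:
        return "low"
    if beta_w <= 5:
        return "medium"
    return "high"


example = [
    [5, 3, 0, 0],
    [0, 2, 4, 0],
    [0, 0, 1, 6],
]
print(whittaker_beta(example), beta_class(whittaker_beta(example)))
```

In this example four species occur overall and each sample unit holds two of them, so the beta diversity is 4/2 - 1 = 1, which falls at the lower edge of the medium class.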

##630
Consider redundancy analysis (RDA). This method is not available in PC-ORD, because it is not often useful with ecological community data.
?4066
~176
1
@69 RDA
Redundancy analysis (RDA) is a constrained ordination of sample units in environmental (or other) space.  It assumes a linear response of variables in the main matrix to variables in the second matrix and linear relationships among the variables in the main matrix.  As with CCA, it is important to recognize that RDA ignores variation in the main matrix that is unrelated to the measured variables in the second matrix. Unlike CCA, RDA assumes linear relationships among all variables.


##640
Consider canonical correspondence analysis (CCA).
?4016
~164
1
@61 CCA
Canonical correspondence analysis (CCA) ordinates sample units in environmental space.  It assumes a unimodal species response to environmental factors.  The most serious drawbacks of CCA are that (1) it ignores community variation that is unrelated to the measured environmental factors, (2) it assumes that environmental factors are linearly interrelated, and (3) the positions of points in the resulting ordination are strongly influenced by noise in the environmental data.

#700
Are relationships among variables in your main matrix reasonably linear (approximated by straight lines)?
?7564
~35
2
710, Yes
720, No or don't know.
Traditional multivariate analyses are based on a linear model.  That is, we assume that relationships among variables can be reasonably approximated by straight lines.  This is generally not true for species data, except for data sets with low beta diversity (i.e. sampled from short environmental gradients or otherwise representing fairly small changes in species composition).  You can evaluate this by making a series of scatterplots of one species' abundance versus another (In PC-ORD, select Graph | Scatterplot or Graph | Scatterplot Matrix).  Are the relationships among species reasonably approximated by straight lines?  If so, you should consider using a linear model.

##710
Consider principal components analysis (PCA).
?4037
~114
1
@65 PCA
Principal components analysis (PCA) is the most basic eigenanalysis technique.  It is suitable for data where the relationships among columns (variables) are reasonably represented by straight lines.  This is not generally true for community data, but many other kinds of variables are well suited to PCA.

#720
Are you interested in a single primary gradient or underlying dimension?
?
~152
2
725, Yes
850, No
Some analytical methods perform well when there is only one strong underlying dimension.  Other methods are better at simultaneously representing two or more gradients.  If you are interested in a single factor, for example a transition from wet to dry environments, and there are no other important factors operating, then it is fine to use a method that can represent only a single gradient.

#725
Do you wish to apply previously assigned species weights to calculate a score for each sample unit?
?4055
~149
2
730, Yes
750, No
Species weights can be assigned according to some a priori principle, then used to calculate a score for each sample unit.  For example, the Federal Wetlands Manual assigns a wetland indicator value to a large number of flowering plant species.  These indicator values are then combined with weighted averaging to produce a score for a particular site, based on the abundances of indicators present at that site.

##730
Use weighted averaging (WA) to do a one-dimensional ordination.  This calculates a score for each sample unit based on weighted averaging of your community data with your a priori species scores.  To do this, you must have species scores open as a second matrix.  If this is the case, select "Run" to run weighted averaging.  Otherwise exit, then set up your matrices and run Ordination | Weighted Averaging or do it via matrix multiplication (Modify Data | Multiply Matrices).
?4055
~149
1
@67 WEIGHTED AVERAGING
PC-ORD allows weighted averaging through either Ordination | Weighted Averaging or Modify Data | Multiply Matrices.  The community matrix is opened as the main matrix, with sample units as rows and species as columns. The species weights are opened as the second matrix, with species as rows and one or more sets of weights as columns.  Historically a single set of weights (e.g. moisture preferences below) is used, but there is no reason why you cannot use more than one set of weights simultaneously.
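
As a rough illustration of the calculation, the sketch below scores each sample unit as the abundance-weighted mean of the species weights, that is, the sum of abundance times weight divided by the total abundance in the sample unit. The function name, the example abundances, and the moisture indicator values are all hypothetical, and the sketch is not meant to reproduce PC-ORD's output exactly.

```python
def weighted_average_scores(community, weights):
    """Score each sample unit as the abundance-weighted mean species weight.

    community: rows = sample units, columns = species.
    weights: one a priori weight per species, in column order.
    """
    scores = []
    for row in community:
        total = sum(row)
        if total == 0:
            scores.append(None)   # no species present, so no score
            continue
        scores.append(sum(a * w for a, w in zip(row, weights)) / total)
    return scores


community = [
    [10, 0, 0],
    [5, 5, 0],
    [0, 2, 6],
]
moisture = [1, 3, 5]   # hypothetical indicator values, one per species
print(weighted_average_scores(community, moisture))
```

A sample unit dominated by species with high indicator values receives a high score, so the scores place the sample units along a single a priori gradient.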

#750
Is the chi-square distance an acceptable representation of relationships among your sample units (rows)?
?8501
~49
2
760, Yes
850, No
The correspondence analysis family implicitly uses a chi-square distance to represent relationships among your objects.  This distance measure builds in a double weighting by row and column totals.  Elements of the data matrix are divided by the row total, then the differences between rows for a given column are inversely weighted by the column total.  One consequence of this is that sample units with concentrations of infrequent species are perceived as outliers.

Note that all elements in the data must be non-negative to use a chi-square distance.

See also McCune and Grace (2002), p. 50 and Minchin (1987a) cited in McCune & Grace.

Choose "yes" to chi-square distance only if the following are all true:
  - you wish to give high weight to species with low abundance in the data,
  - you consider sample units with several low abundance species to be a feature that you wish to emphasize,
  - you do not wish to use a distance measure that reaches a constant maximal value for sample units with no species in common, and
  - your data are all non-negative values.
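
The double weighting described above can be written out directly: each element is divided by its row total, and the squared difference between two rows in a given column is divided by that column's total. The sketch below follows that description. The function name and the example matrix are hypothetical, and it assumes that every row has a nonzero total.

```python
import math


def chi_square_distance(matrix, i, h):
    """Chi-square distance between rows i and h of a non-negative matrix."""
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    total = 0.0
    for j, col_total in enumerate(col_totals):
        if col_total == 0:
            continue   # an empty column contributes nothing
        diff = matrix[i][j] / row_totals[i] - matrix[h][j] / row_totals[h]
        total += diff * diff / col_total
    return math.sqrt(total)


data = [
    [8, 2, 0],
    [4, 1, 0],
    [8, 2, 1],
]
print(chi_square_distance(data, 0, 1))   # same proportions in both rows
print(chi_square_distance(data, 0, 2))   # third row adds one rare species
```

The first two rows differ in total abundance but not in proportions, so their distance is zero. The third row differs from the first only by a single occurrence of a species that is rare in the matrix as a whole, and the small column total inflates that difference.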


##760
Consider using correspondence analysis (CA or RA) or nonmetric multidimensional scaling (NMS).  Either method is likely to suffice in this case.
?4040
~125
1
@66 CA
CA (RA) suffers several faults that detract from its utility as an ordination method.  It gives reasonable representations of only one axis, the second axis introducing an arch into the first.  Subsequent axes introduce more complex polynomials of the first axis.  Points in the center are spread too much relative to the edges.  Outliers can have a strong effect.  In general CA should be avoided with community data sets that have more than one important axis of variation.

CA can also be represented as an eigenanalysis technique where compositional dissimilarity is measured using the chi-squared distance measure and samples are weighted by their totals (Chardy et al. 1976).  In theory CA results in an ordination space in which distances between sample points are proportional to their chi-squared distance values.

The effect of the chi-squared distance measure is to give high weight to species whose total abundance in the data matrix is low.  This can result in an overemphasis of the distinctness of samples containing several rare species (Faith et al. 1987; Minchin 1987a).  Unlike proportionate city-block distance measures, chi-squared distance does not have a constant maximal value for pairs of sample units with no species in common, but fluctuates according to variations in the abundance of species with high or low total abundances (Minchin 1987a).


##850
Consider using nonmetric multidimensional scaling (NMS, NMDS).
?4028
~125
1
@63 NMS
Nonmetric multidimensional scaling (NMS) is a general-purpose ordination method that is very useful for community analysis.  You can usually obtain a good result by using the "autopilot" setting on PC-ORD, eliminating the need for choosing options other than the distance measure.  NMS has no built-in relativization, leaving this to the user as a prior data transformation.  While this gives the user more flexibility and control than some other methods with built-in relativization (e.g. PCA, CA, DCA), it comes with more responsibility for the user to understand and choose appropriate data adjustments before running the analysis.


#2000
The data adjustment questions assume that the rows of your matrices are sample units and the columns are variables.  Unless otherwise specified, the decision tree refers to your MAIN matrix. Are your variables all in the same units or on the same scale?
?3013
~70
2
2100, Yes. 
2050, No.
For example, if some of your variables in your main matrix are measured in kg/ha and some are measured as individuals/sq.m, these are on different scales and need to be made comparable to each other.  In this case you should answer "no."

Variables in the SECOND matrix are often used one at a time, for example as ordination overlays, which means that we need not consider relativizing them.  If, however, you plan to analyze your second matrix in a multivariate way, then you should consider relativizing the second matrix also.  For example, if you want to run a Mantel test comparing a distance matrix based on the main matrix with a distance matrix based on the second matrix, then you must consider how variables are scaled relative to each other within the same matrix.

#2050
Your variables need to be relativized.
?3013
~70
1
2550, Continue.
Either as part of the data analysis procedure or as a prior data adjustment, you will need to relativize your variables.  Depending on the analysis that you choose, this may be a built-in automatic procedure, or it may need to be done as a separate step before analysis.

#2100
Did your sample units (rows) have approximately equal area, volume, or otherwise equal effort?
?3013
~73
2
2200, Yes.
2150, No.
For example, say you are counting fish in stream reaches that vary greatly in length or volume of water, but you have no accurate measure of how much habitat is in each of your sample units.  The total number of fish will vary from reach to reach, but this total may have more to do with the amount of habitat than the quality of habitat.  In this case, a long reach has had much more sampling effort than a short reach, resulting in large total numbers in the long reach.  Some data adjustment is usually desirable to reduce this sampling effect.

##2150
You should relativize by sample units (row totals or row sums of squares).
?3013
~73
1
@6 GENERAL RELATIVIZATION

#2170
Do you intend to use a method with an implicit chi-square distance measure? (CA, CCA, DCA)
?8507
~152
2
10, Yes. Now choose an analysis.
2400, No, or I do not know.
The correspondence analysis family automatically relativizes by sample unit totals.

#2200
Does your main matrix solely contain binary data?
?
~3
3
10, Yes. No relativization needed. Now choose an analysis.
2300, Yes, but I want to consider relativization anyway.
2300, No.
Binary data have only two possible values, zero and one. In community analysis, binary data most often refer to presence (1) and absence (0).  Binary variables can usually be considered to be on the same scale, so relativization is not usually needed.

#2300
Do you wish to equalize weights given to sample units OR focus on relative abundance rather than absolute abundance?
?3013
~70
2
2170, Yes.
2400, No.
The most common use of relativizing by rows (sample units) is to shift the focus of the analysis to relative abundance, rather than absolute abundance.  It can also be used to equalize the weight given to sample units, each sample unit contributing equally to the matrix total or sums of squares.
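
A minimal sketch of relativization by row totals follows, using hypothetical counts from two sample units that differ tenfold in total abundance but have the same relative composition. The function name is an assumption made for illustration.

```python
def relativize_rows_by_total(matrix):
    """Divide each value by its row total so every nonempty row sums to 1."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


counts = [
    [20, 60, 20],   # sample unit with high total abundance
    [2, 6, 2],      # sample unit with low total abundance, same proportions
]
print(relativize_rows_by_total(counts))
```

After relativization the two rows are identical, which shows how this adjustment removes differences in absolute abundance and leaves only relative abundance.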

#2400
Do you wish to equalize weights given to variables OR do you wish to avoid strong emphasis on dominant species? (Assuming species are in columns).
?3013
~73
2
2500, Yes.
10, No.

#2500
Are your variables (columns) already relativized?
?3013
~70
2
10, Yes.  Now choose an analysis.
2550, No.

#2550
Does your main matrix contain species abundances or presence data (columns=species)?
?
~5
2
2600, Yes.
2650, No.
With community data sets the variables (columns) of the main matrix are normally species.  Species data have particular properties: the lowest possible value is typically zero. 

##2600
Relativization by species maximum is recommended.
?3019
~73
1
@7 RELATIVIZE BY COLUMN MAX IN MAIN MATRIX
Relativization by species maximum tends to equalize common and uncommon species.  This is often an effective transformation for community data if you wish to express patterns in the less common species, as well as the common species. You can run this in PC-ORD with Modify Data | Relativizations | Relativization by Maximum.  Each value in the matrix is expressed as a proportion of the maximum value in that column.
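
The calculation can be sketched as follows for a small hypothetical cover matrix with one abundant species and one sparse species. The function name is an assumption made for illustration.

```python
def relativize_by_column_max(matrix):
    """Express each value as a proportion of its column maximum."""
    col_max = [max(col) for col in zip(*matrix)]
    return [
        [v / m if m else 0.0 for v, m in zip(row, col_max)]
        for row in matrix
    ]


cover = [
    [80, 1],
    [40, 2],
    [0, 4],
]
print(relativize_by_column_max(cover))
```

Both columns now range from 0 to 1, so the sparse species carries as much weight in later distance calculations as the abundant one.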

#2650
Do you intend to use PCA based on a correlation matrix?
?4038
~116
2
710, Yes.
2700, No, or I do not know.

#2700
You should relativize your columns in some way.  Do you intend to use Sorensen distance or other proportion-based distance?
?3013
~70
2
2750, Yes.
2800, No.

##2750
You should relativize by column totals.
?3018
~73
1
@6 GENERAL RELATIVIZATION
Relativization by column totals makes sense when the use of a proportion-based distance measure, such as Sorensen or Jaccard, is planned.  With species data, however, this relativization tends to give too much emphasis to rare species, amplifying the noise in the data.

#2800
Do you wish to equalize variances or sums of squares of your columns?
?3021
~74
2
2850, Equalize variances: each column will have mean=0 and variance=1.
2900, Equalize sums of squares: each column will have sum(x^2) = 1.
Equalizing variances of columns can be useful prior to applying Euclidean-based distance measures.  Most Euclidean-based statistics, however, such as the Pearson correlation coefficient, automatically apply this standardization.  It is also automatically applied in canonical analyses (such as CCA and optionally in RDA) to the predictors, so that they are given equal weight in the analysis.

Equalizing sums of squares is appropriate when relativizing in advance of application of a distance measure that is based on sums of squares.  For example, when calculating Euclidean distances among rows, prior relativization by sums of squares of the rows will give rows equal weight, and the sum of squares of each row will equal one.  The distance measure is then based on differences in relative proportions within the rows.

##2850
Relativize by standard deviates of your columns (Modify Data | Relativizations | by standard deviates).
?3021
~74
1
@9 RELATIVIZE BY STANDARD DEVIATES OF COLUMNS
Relativization by standard deviates of the columns does a linear rescaling in each column, such that the column has a mean of zero, a variance of one, and a standard deviation of one.
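
The rescaling can be sketched as below. The function name and the example data are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption; the sketch also assumes that no column is constant.

```python
import statistics


def standardize_columns(matrix):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*matrix))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]   # sample SD, an assumption
    return [
        [(v - m) / s for v, m, s in zip(row, means, sds)]
        for row in matrix
    ]


measurements = [
    [1, 10],
    [2, 20],
    [3, 30],
]
print(standardize_columns(measurements))
```

Although the second column is ten times larger than the first in its original units, both columns come out identical after standardization, so they receive equal weight.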

#2900
Relativize columns by sums of squares (Modify Data | Relativizations | General relativization | p=2).
?3018
~73
1
@6 GENERAL RELATIVIZATION
Select general relativization with the parameter equal to two.  This equalizes the sums of squares of your columns.
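
As a sketch of what the parameter does, the code below divides each value by the p-th root of the sum of the p-th powers of the values in its column. This form of general relativization is assumed here for illustration, and the function name and data are hypothetical. With p = 1 it reduces to relativization by column totals, and with p = 2 each column ends up with a sum of squares of one.

```python
def general_relativization_columns(matrix, p=2):
    """Divide each value by (sum of |x|**p over its column) ** (1/p)."""
    norms = [
        sum(abs(v) ** p for v in col) ** (1.0 / p)
        for col in zip(*matrix)
    ]
    return [
        [v / n if n else 0.0 for v, n in zip(row, norms)]
        for row in matrix
    ]


data = [
    [3, 1],
    [4, 1],
]
relativized = general_relativization_columns(data, p=2)
print(relativized)
print([sum(v * v for v in col) for col in zip(*relativized)])
```

The second printed line gives the sum of squares of each relativized column, which should be one apart from floating-point rounding.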
